Distributed-Memory Breadth-First Search on Massive Graphs

Authors

Aydin Buluç

Scott Beamer

Kamesh Madduri

Krste Asanovic

David A. Patterson

Abstract

In this chapter, we study the problem of traversing large graphs. A traversal, a systematic method of exploring all the vertices and edges in a graph, can be done in many different orders. A traversal in "breadth-first" order, a breadth-first search (BFS), is important because it serves as a building block for many graph algorithms. Parallel graph algorithms increasingly rely on BFS for exploring all vertices in a graph because depth-first search is inherently sequential. Fast parallel graph algorithms often use BFS, even when the optimal sequential algorithm for solving the same problem relies on depth-first search. Strongly connected component decomposition of a graph [27, 32] is an example of such a computation. Given a distinguished source vertex s, BFS systematically explores the graph G to discover every vertex that is reachable from s. In the worst case, BFS has to explore all of the edges in the connected component that s belongs to in order to reach every vertex in that component. A simple level-synchronous traversal that explores all of the outgoing edges of the current frontier (the set of vertices discovered in this level) is therefore considered optimal in the worst-case analysis. This level-synchronous algorithm exposes a lot of parallelism for low-diameter (small-world) graphs [35]. Many real-world graphs, such as those representing social interactions and brain anatomy, are known to have small-world characteristics.

Parallel BFS on distributed-memory systems is challenging due to its low computational intensity and irregular data access patterns. Recently, a large body of optimization strategies has been designed to improve the performance of parallel BFS in distributed-memory systems. Among those, two major techniques stand out in terms of their success and general applicability. The first one is the direction optimization of Beamer et al. [4], which optimistically reduces the number of edge examinations by integrating a bottom-up algorithm into the traversal. The second one is the use of two-dimensional (2D) decomposition of the sparse adjacency matrix of the graph [6, 37].

In this article, we build on our prior distributed-memory BFS work [5, 6] by expanding our performance optimizations. We generalize our 2D approach to arbitrary pr × pc rectangular processor grids (which subsumes the 1D cases in its extremes of pr = 1 or pc = 1). We evaluate the effects of in-node multithreading for performance and scalability. We run our experiments on three different platforms: a Cray XE6, a Cray XK7 (without using co-processors), and a Cray XC30. We compare the effects of using a scalable hypersparse representation called Doubly Compressed Sparse Column (DCSC) for storing local subgraphs versus a simpler but less memory-efficient Compressed Sparse Row (CSR) representation. Finally, we validate our implementation by using it to efficiently traverse a real-world graph.
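To make the level-synchronous traversal and the bottom-up idea concrete, the following is a minimal single-process sketch in Python. It is not the authors' distributed implementation: the function name `bfs_levels`, the parameter `bottom_up_fraction`, and the simple frontier-size rule for switching directions are assumptions made for illustration only, and the graph is assumed to be undirected and stored as plain adjacency lists.

```python
def bfs_levels(adj, source, bottom_up_fraction=0.25):
    """Level-synchronous BFS over an undirected graph.

    adj[v] is the list of neighbours of vertex v.
    Returns level[v] = BFS depth of v, or -1 if v is unreachable.
    bottom_up_fraction is a hypothetical switching threshold,
    not the heuristic used in the paper.
    """
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        next_frontier = set()
        if len(frontier) > bottom_up_fraction * n:
            # Bottom-up step: every unvisited vertex searches its own
            # neighbours for a parent in the current frontier and stops
            # at the first hit, which can skip many edge examinations.
            for v in range(n):
                if level[v] == -1:
                    for u in adj[v]:
                        if u in frontier:
                            level[v] = depth
                            next_frontier.add(v)
                            break
        else:
            # Top-down step: every frontier vertex examines all of its
            # outgoing edges and claims unvisited neighbours.
            for u in frontier:
                for v in adj[u]:
                    if level[v] == -1:
                        level[v] = depth
                        next_frontier.add(v)
        frontier = next_frontier
    return level


# Small example: a 4-cycle 0-1-3-2-0 with a tail 3-4 and an isolated vertex 5.
example_adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
example_levels = bfs_levels(example_adj, source=0)
```

Both step types produce the same levels; only the number of edges examined differs. Setting `bottom_up_fraction` to 1.0 forces a purely top-down traversal, which is a convenient way to check the two directions against each other.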

Related resources

Breadth First Search on Massive Graphs

We consider the problem of Breadth First Search (BFS) traversal on massive sparse undirected graphs. Despite the existence of simple linear time algorithms in the RAM model, it was considered non-viable for massive graphs because of the I/O cost it incurs. Munagala and Ranade [29] and later Mehlhorn and Meyer [27] gave efficient algorithms (referred to as MR BFS and MM BFS, respectively) for com...

From Distributed Memory Cycle Detection to Parallel LTL Model Checking

In [2] we proposed a parallel graph algorithm for detecting cycles in very large directed graphs distributed over a network of workstations. The algorithm employs back-level edges as computed by the breadth first search. In this paper we describe how to turn the algorithm into an explicit state distributed memory LTL model checker by extending it with detection of accepting cycles, counterexamp...

Graph partitioning for scalable distributed graph computations

Inter-node communication time constitutes a significant fraction of the execution time of graph algorithms on distributed-memory systems. Global computations on large-scale sparse graphs with skewed degree distributions are particularly challenging to optimize for, as prior work shows that it is difficult to obtain balanced partitions with low edge cuts for these graphs. In this work, we attemp...

Distributed Graph Layout for Scalable Small-world Network Analysis

The in-memory graph layout or organization has a considerable impact on the time and energy efficiency of distributed memory graph computations. It affects memory locality, inter-task load balance, communication time, and overall memory utilization. Graph layout could refer to partitioning or replication of vertex and edge arrays, selective replication of data structures that hold meta-data, an...

A Note on (Parallel) Depth- and Breadth-First Search by Arc Elimination

This note recapitulates an algorithmic observation for ordered Depth-First Search (DFS) in directed graphs that immediately leads to a parallel algorithm with linear speed-up for a range of processors for non-sparse graphs. The note extends the approach to ordered Breadth-First Search (BFS). With p processors, both DFS and BFS algorithms run in O(m/p + n) time steps on a shared-memory parallel ...

Journal title:

CoRR

Volume: abs/1705.04590, issue:

Pages: -

Publication date: 2015
